added why.md for the environments by yogesh1801 · Pull Request #23 · sys-intelligence/system-intelligence-benchmark

yogesh1801 · 2025-11-27T22:14:35Z

Description

This PR addresses Issue #22 by adding dedicated WHY.md files to each benchmark directory and linking them from the root README. These files explain why each benchmark matters and how it fits into the broader vision of system intelligence, following the pattern established in PR #21.

Changes

- Added WHY.md to System Exam Benchmark
- Added WHY.md to System Lab Benchmark
- Added WHY.md to System Artifact Benchmark
- Added WHY.md to System Modeling Benchmark
- Added WHY.md to Cache Algorithm Benchmark
- Updated root README.md to add WHY links next to each benchmark entry in the benchmark list
- Added Cache Algorithm Benchmark to the root README benchmark list (it was previously missing)

Testing

- Verified that all 5 WHY.md files exist in their respective benchmark directories
- Confirmed that all WHY.md links in the root README.md point to the correct file paths
- Reviewed each WHY.md for consistency with the benchmark READMEs and the overall system intelligence vision
- Validated that the markdown formatting renders correctly

Checklist

- Tests pass locally (documentation-only changes)
- Code follows project style guidelines
- Documentation updated (this PR is a documentation enhancement)

bastoica

I'm a bit confused as to why you added a new WHY.md for arteval_bench. Is this simply the existing file, or did you make any edits? Thanks!

yogesh1801 · 2025-12-06T22:24:53Z

Hi @bastoica, sorry for the confusion. The ArtEval benchmark WHY.md is the same existing file; because of a mistake on my side it showed up as a new file in the commit, but its content is unchanged.
I have fixed this in the next commit.

bastoica · 2025-12-06T23:01:27Z

sounds good, thanks @yogesh1801

xuafeng · 2025-12-11T23:58:49Z

@tareknaser, can you help review whether the new WHY.md files make sense to you?

xuafeng · 2025-12-11T23:59:30Z

@Qian-Cheng-nju, can you help review whether the new WHY.md for SysMoBench works for you? Any comments are welcome.

Qian-Cheng-nju · 2025-12-12T09:05:31Z

> @Qian-Cheng-nju can you help review if the new WHY.md of SysMoBench works for you? Welcome any comments.

We recently added the ringbuffer module from Asterinas, as well as ZooKeeper, so I updated the number and types of systems in the description. Everything looks good to me now. Thank you very much for such a detailed document!

xuafeng and others added 30 commits November 5, 2025 18:10

Rename it "System Intelligence Benchmark"

6d24e69

Init: Initialize SysMoBench benchmark integration

87db1e9

feat: Add gitigore

69f4cb5

feat: Add prototype for phase 1&2

843f031

feat: Distinguish evaluator and model API keys in env.toml

0d2b38f

feat: Add validation for required evaluator API keys

b2acaa7

doc: update README.md

ca7e72e

initial ArtEval commit

ec7b57f

Merge pull request sys-intelligence#2 from systemintelligence/feat/distinguish-api-keys

a607e73

Distinguish the models used in the executor and evaluator

feat: Add test

60d30e0

featr: Add install.sh

ff96313

adding overview and contributor's guide

1799370

skeleton ArtEval agent implementation

2054314

adding sosp24 wasabi

6303aa5

docs: add arteval to main README

a5358dc

Signed-off-by: Tarek <tareknaser360@gmail.com>

feat(ci): add GH Actions workflow for running benchmarks tests

904374e

Signed-off-by: Tarek <tareknaser360@gmail.com>

feat: add issue and pull request templates

40ccf1f

Signed-off-by: Tarek <tareknaser360@gmail.com>

fix(ci): add a test for example_bench

4130c7a

Signed-off-by: Tarek <tareknaser360@gmail.com>

fix: shell scripts to be executable

3af5b70

Signed-off-by: Tarek <tareknaser360@gmail.com>

docs: update README with instructions for running a single benchmark

156c77c

Signed-off-by: Tarek <tareknaser360@gmail.com>

docs: a note on docker image arch support

a0557f9

Signed-off-by: Tarek <tareknaser360@gmail.com>

meta: add outputs directories to gitignore

868da59

Signed-off-by: Tarek <tareknaser360@gmail.com>

feat(ci): add release trigger to workflow

ea9b54d

Signed-off-by: Tarek <tareknaser360@gmail.com>

fix: Use tla_specification instead of generated_text to adapt upstream changes

5ef835f

Merge commit '04900168e10834f3aa5eef4d13b318e1efcdac24' as 'benchmarks/sysmobench/sysmobench_core'

c5dfbb1

fix: Add gpt-4o config and fix cross-device link issue in setup_tools

a68e171

- Add gpt-4o model configuration to models.yaml
- Fix setup_tools.py to use shutil.move instead of os.rename

This resolves the 'Invalid cross-device link' error when /tmp is on a different filesystem.
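The cross-device fix in commit a68e171 can be illustrated with a minimal Python sketch. It assumes the setup step moves a directory staged under /tmp into its final location; `move_tool_dir` and `demo_tool` are hypothetical names for illustration, not code from setup_tools.py.

```python
import shutil
import tempfile
from pathlib import Path


def move_tool_dir(src: str, dst: str) -> str:
    """Move a staged tool directory from src to dst.

    Hypothetical helper (not the actual setup_tools.py code).
    os.rename() maps to a single rename(2) system call, which raises
    OSError with errno EXDEV ("Invalid cross-device link") when src and
    dst are on different filesystems, e.g. when /tmp is a separate mount.
    shutil.move() uses os.rename() when it can and otherwise falls back
    to copying src to dst and then removing src.
    """
    return shutil.move(src, dst)


if __name__ == "__main__":
    # Demo: stage a fake tool in one temporary directory, move it to another.
    with tempfile.TemporaryDirectory() as staging, tempfile.TemporaryDirectory() as target:
        tool = Path(staging) / "demo_tool"
        tool.mkdir()
        (tool / "README").write_text("demo")
        final = move_tool_dir(str(tool), str(Path(target) / "demo_tool"))
        print(Path(final, "README").read_text())  # prints "demo"
```

Using `shutil.move` keeps the fast rename path when source and destination share a filesystem, so the fix costs nothing in the common case.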

fix: Convert GenerationOutput to GenerationResult for evaluators

984336a

docs: Update README and install script for Git Subtree integration

68025ca

feat: Add docker file

25c6af8

fix: Add env.toml

97c1c3c

xuafeng and others added 17 commits November 19, 2025 08:59

Merge pull request sys-intelligence#16 from sys-intelligence/docs_course_lab_bench

f7d3cbd

Course Lab Benchmark: Add Instructions for Extending the Benchmark

Merge pull request sys-intelligence#15 from sys-intelligence/arteval_benchmark

535a78f

Adding ArtEvalBench v0.9

refactor: reorganizing and improving the step-by-step guidelines for adding new artifacts to ArtEvalBench

e6361a4

fix: add a brief explanation for 'docker_env' schema field

2523106

refactor: update the new schema naming convention

e0059f3

fix: apply suggestions to WHY.md

c72230a

Co-authored-by: Tarek Elsayed <60650661+tareknaser@users.noreply.github.com>

fix: a few typos in WHY.md

77b6e7f

Co-authored-by: Tarek Elsayed <60650661+tareknaser@users.noreply.github.com>

refactor: minor formatting and style improvements

c207e59

fix: remove obsolete dependency installation instructions from README

db6a947

refactor: rework the first paragraph and fix minor text redering issues

8b05a67

fix: clean-up repository, remove unnecessary or unused scripts

07f76c0

fix: update Dockerfile and removed unused scripts

16c35b0

feature: add updated Docker image and environment bootstrapt scripts

f73cb6a

fix: few tweaks re installation and setup

44c37be

refactor: add default Docker image, rewrite agent prompt, and remove ureachable code

2a8d374

fix: patch the benchmark schema file and improve Wasabi's README instructions

6409d2f

Merge pull request sys-intelligence#21 from bastoica/main

13c5fa1

Improving the "contributor's guide" and simplifying the benchmark's schema

bastoica self-requested a review December 6, 2025 21:25

bastoica requested changes Dec 6, 2025

why md for benchmarks

3928b04

Signed-off-by: Yogesh <yogeshsingla481@gmail.com>

yogesh1801 force-pushed the pr/why-md branch from 35ff2c6 to 3928b04 Compare December 6, 2025 22:28

bastoica self-assigned this Dec 6, 2025

bastoica requested a review from xuafeng December 6, 2025 23:00

doc: Revise benchmark system count and types in WHY.md

c783e5c

Updated the number of systems and their types in the benchmark description.

tareknaser force-pushed the main branch from 57b962d to a1780ed Compare February 5, 2026 16:46

yogesh1801 wants to merge 98 commits into sys-intelligence:main from yogesh1801:pr/why-md

5 participants